Conversation

@anschaible
Collaborator

A minimal version of the gradient is also working on the sharded pipeline.

Here we calculate the gradient with respect to age and metallicity for two identical particles.
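
A minimal, self-contained sketch of this step, assuming a toy stand-in (`toy_spectrum`) for the actual spectrum computation in the pipeline; only the `jax.grad` pattern is meant to carry over, not the physics:

```python
import jax
import jax.numpy as jnp

def toy_spectrum(age, metallicity):
    # Stand-in for the real per-particle spectrum computation in the pipeline.
    wavelengths = jnp.linspace(0.0, 1.0, 100)
    return jnp.exp(-wavelengths[None, :] * age[:, None]) * (1.0 + metallicity[:, None])

def loss(age, metallicity, target):
    # Sum the spectra of all particles and compare against a target spectrum.
    pred = toy_spectrum(age, metallicity).sum(axis=0)
    return jnp.mean((pred - target) ** 2)

# Two identical particles, as in the notebook.
age = jnp.array([5.0, 5.0])
metallicity = jnp.array([0.02, 0.02])
target = toy_spectrum(jnp.array([6.0, 6.0]), jnp.array([0.03, 0.03])).sum(axis=0)

# Gradients of the scalar loss with respect to both parameter arrays.
d_age, d_metallicity = jax.grad(loss, argnums=(0, 1))(age, metallicity, target)
```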

The next step for the gradient is to include the distribution function and compute the gradient with respect to its parameters.

The only problem here: if I run the notebook gradient_age_metallicity_adamoptimizer.ipynb on 2 CPUs (e.g. my local laptop), I get good-looking results. If I run the exact same notebook on 2 GPUs, I get very different and bad-looking results. I have not yet found out what is going wrong.

@anschaible
Collaborator Author

anschaible commented Jul 2, 2025

[Image: gradient_mac]

[Image: Screenshot 2025-07-02 16:25]

At the moment the gradient is working on CPUs (we tested on Tobias's laptop and mine). However, if we run the same thing on GPUs, the loss landscape looks very wrong and the parameter optimization does not find the ground truth.

@anschaible anschaible requested review from TobiBu and ufuk-cakir July 2, 2025 14:51
"# NBVAL_SKIP\n",
"import os\n",
"#os.environ['SPS_HOME'] = '/mnt/storage/annalena_data/sps_fsps'\n",
"#os.environ['SPS_HOME'] = '/home/annalena/sps_fsps'\n",
Collaborator


As a rule, don't hardcode paths in your home directory or with your username in them. Use pathlib.Path.home(), for instance.
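
One possible way to do this, as a sketch (the fallback location `~/sps_fsps` is an assumption, not a project convention):

```python
import os
from pathlib import Path

# Prefer an already-set SPS_HOME; otherwise fall back to a path under the
# user's home directory instead of a hardcoded, user-specific absolute path.
os.environ.setdefault("SPS_HOME", str(Path.home() / "sps_fsps"))
```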

"\n",
"def loss_only_wrt_age_metallicity(age, metallicity, base_data, target):\n",
" \n",
" base_data.stars.age = age*20\n",
Collaborator


This is a mutating operation and as such will not play well with jax.grad, i.e. the result is undefined. This might be the cause of the broken gradient on the GPU. However, I was not able to check this because the loss gives NaN after a few optimization rounds every time. Is there a quick solution to this?

As a general rule, I don't think mutating the parameters inside the loss function is the best way to do this. Longer term, maybe we should think about modifying the pipeline so that parameters and input data are conceptually separated, which would make it easier to write gradients... it wasn't obvious that it would turn out this way...
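
A rough sketch of the suggested separation, with hypothetical names (`forward`, `params`) that are not the current pipeline API: the parameters are passed explicitly as a pytree instead of being written into `base_data`, so the loss stays a pure function and jax.grad has well-defined semantics.

```python
import jax
import jax.numpy as jnp

def forward(params, base_data, target):
    # Derive the physical quantities functionally instead of mutating base_data.
    age = params["age"] * 20.0
    metallicity = params["metallicity"]
    pred = jnp.exp(-base_data[None, :] * age[:, None]) * (1.0 + metallicity[:, None])
    return jnp.mean((pred.sum(axis=0) - target) ** 2)

# Toy inputs: two particles, a stand-in wavelength grid and target spectrum.
params = {"age": jnp.array([0.25, 0.25]), "metallicity": jnp.array([0.02, 0.02])}
base_data = jnp.linspace(0.0, 1.0, 100)
target = jnp.zeros(100)

# jax.grad over a dict of parameters returns a dict of gradients.
grads = jax.grad(forward)(params, base_data, target)
```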

@MaHaWo
Collaborator

MaHaWo commented Jul 22, 2025

This PR is also far too big. Can we maybe restructure it into multiple PRs? Do we need all the Jupyter notebooks?

@TobiBu TobiBu merged commit 8544605 into main Nov 10, 2025
6 checks passed